27 research outputs found

TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark

Author: Boncz P.A. (Peter)
Erling O. (Orri)
Neumann T. (Thomas)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The TPC-D benchmark was developed almost 20 years ago, and even though its current existence as TPC-H could be considered superseded by TPC-DS, one can still learn from it. We focus on the technical level, summarizing the challenges posed by the TPC-H workload as we now understand them, which w

VU Research Portal

CWI's Institutional Repository

S3G2: a Scalable Structure-correlated Social Graph Generator

Author: Boncz P.A. (Peter)
Erling O. (Orri)
Pham M.-D. (Minh-Duc)
Publication venue: CWI
Publication date: 01/06/2012

Benchmarking graph-oriented database workloads and graph-oriented database systems are increasingly becoming relevant in analytical Big Data tasks, such as social network analysis. In graph data, structure is not mainly found inside the nodes, but especially in the way nodes happen to be connected, i.e. structural correlations. Because such structural correlations determine join fan-outs experienced by graph analysis algorithms and graph query executors, they are an essential, yet typically neglected, ingredient of synthetic graph generators. To address this, we present S3G2: a Scalable Structure-correlated Social Graph Generator. This graph generator creates a synthetic social graph, containing non-uniform value distributions and structural correlations, and is intended as a testbed for scalable graph analysis algorithms and graph database systems. We generalize the problem to decompose correlated graph generation in multiple passes that each focus on one so-called "correlation dimension"; each of which can be mapped to a MapReduce task. We show that using S3G2 can generate social graphs that (i) share well-known graph connectivity characteristics typically found in real social graphs (ii) contain certain plausible structural correlations that influence the performance of graph analysis algorithms and queries, and (iii) can be quickly generated at huge sizes on common cluster hardware

CWI's Institutional Repository

SMART-KG: Hybrid Shipping for SPARQL Querying on the Web

Author: Acosta Maribel
Aluç Güneş
Aranda Carlos Buil
Bonatti Piero Andrea
Buil-Aranda Carlos
Erling Orri
Hartig Olaf
Hasnain Ali
Heling Lars
Hernández-Illera A.
Martínez-Prieto M.A.
Meimaris M.
Polleres Axel
Saleem Muhammad
Publication venue: ACM Digital Library
Publication date: 01/01/2020

While Linked Data (LD) provides standards for publishing (RDF) and (SPARQL) querying Knowledge Graphs (KGs) on the Web, serving, accessing and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the problem of availability by pushing query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper we present smart-KG, a novel approach to share the load between servers and clients, while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that smart-KG outperforms state-of-the-art client-side solutions and increases server-side availability towards more cost-effective and balanced hosting of open and decentralized KGs

Crossref

KITopen

The LDBC Social Network Benchmark

Author: Angles Renzo
Antal János Benjamin
Averbuch Alex
Birler Altan
Boncz Peter
Búr Márton
Erling Orri
Gubichev Andrey
Haprian Vlad
Kaufmann Moritz
Marton József
Martínez Norbert
Paradies Marcus
Pey Josep Lluís Larriba
Pham Minh-Duc
Prat-Pérez Arnau
Spasić Mirko
Steer Benjamin A.
Szakállas Dávid
Szárnyas Gábor
Waudby Jack
Wu Mingxi
Zhang Yuchen
Publication date: 04/06/2022

The Linked Data Benchmark Council's Social Network Benchmark (LDBC SNB) is an effort intended to test various functionalities of systems used for graph-like data management. For this, LDBC SNB uses the recognizable scenario of operating a social network, characterized by its graph-shaped data. LDBC SNB consists of two workloads that focus on different functionalities: the Interactive workload (interactive transactional queries) and the Business Intelligence workload (analytical queries). This document contains the definition of the Interactive Workload and the first draft of the Business Intelligence Workload. This includes a detailed explanation of the data used in the LDBC SNB benchmark, a detailed description for all queries, and instructions on how to generate the data and run the benchmark with the provided software. Comment: For the repository containing the source code of this technical report, see https://github.com/ldbc/ldbc_snb_doc

arXiv.org e-Print Archive

Experiences with Virtuoso Cluster RDF Column Store

Author: Boncz Peter
Erling Orri
Pham Minh-Duc
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2013

CWI's Institutional Repository

RDFSync: efficient remote synchronization of RDF models

Author: Christian Morbidoni
Giovanni Tummarello
Orri Erling
Reto Bachmann-Gmür
Publication date: 01/01/2007

In this paper we describe RDFSync, a methodology for efficient synchronization and merging of RDF models. RDFSync is based on decomposing a model into Minimum Self-Contained graphs (MSGs). After illustrating theory and deriving properties of MSGs, we show how an RDF model can be represented by a list of hashes of such information fragments. The synchronization procedure here described is based on the evaluation and remote comparison of these ordered lists. Experimental results show that the algorithm provides very significant savings on network traffic compared to the file-oriented synchronization of serialized RDF graphs. Finally, we provide the design and report the implementation of a protocol for executing the RDFSync algorithm over HTTP

CiteSeerX

IRIS Università Politecnica delle Marche

The meaningful use of big data: Four perspectives - four challenges

Author: Bizer Christian
Boncz Peter
Brodie Michael L.
Erling Orri
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011

Twenty-five Semantic Web and Database researchers met at the 2011 STI Semantic Summit in Riga, Latvia July 6-8, 2011 to discuss the opportunities and challenges posed by Big Data for the Semantic Web, Semantic Technologies, and Database communities. The unanimous conclusion was that the greatest shared challenge was not only engineering Big Data, but also doing so meaningfully. The following are four expressions of that challenge from different perspectives

VU Research Portal

CWI's Institutional Repository

MAnnheim DOCument Server